Efficient Robustness Certificates for Discrete Data: Sparsity-Aware Randomized Smoothing for Graphs, Images and More
Existing techniques for certifying the robustness of models for discrete data
either work only for a small class of models or are general at the expense of
efficiency or tightness. Moreover, they do not account for sparsity in the
input, which, as our findings show, is often essential for obtaining non-trivial
guarantees. We propose a model-agnostic certificate based on the randomized
smoothing framework which subsumes earlier work and is tight, efficient, and
sparsity-aware. Its computational complexity does not depend on the number of
discrete categories or the dimension of the input (e.g. the graph size), making
it highly scalable. We show the effectiveness of our approach on a wide variety
of models, datasets, and tasks -- specifically highlighting its use for Graph
Neural Networks. So far, obtaining provable guarantees for GNNs has been
difficult due to the discrete and non-i.i.d. nature of graph data. Our method
can certify any GNN and handles perturbations to both the graph structure and
the node attributes.
Comment: Proceedings of the 37th International Conference on Machine Learning (ICML 2020)
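A minimal sketch of the sparsity-aware smoothing distribution and the resulting majority-vote classifier (NumPy; base_classifier, p_add, and p_del are illustrative names, and the paper's tight certificate computation is not reproduced here):

```python
import numpy as np

def sample_sparse_noise(x, p_add, p_del, rng):
    """Flip entries of a binary vector with value-dependent probabilities."""
    flip_prob = np.where(x == 1, p_del, p_add)   # sparsity-aware: p_add << p_del
    flips = rng.random(x.shape) < flip_prob
    return np.abs(x - flips.astype(x.dtype))     # XOR of x with the flip mask

def smoothed_prediction(base_classifier, x, p_add=0.01, p_del=0.6,
                        n_samples=1000, seed=0):
    """Majority vote of the base classifier under sparsity-aware random flips."""
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        label = base_classifier(sample_sparse_noise(x, p_add, p_del, rng))
        votes[label] = votes.get(label, 0) + 1
    top = max(votes, key=votes.get)
    return top, votes[top] / n_samples           # class and its empirical probability
```

Keeping p_add much smaller than p_del preserves the sparsity of the input, which is what makes non-trivial guarantees attainable for large, sparse graphs.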
Are GATs Out of Balance?
While the expressive power and computational capabilities of graph neural
networks (GNNs) have been theoretically studied, their optimization and
learning dynamics, in general, remain largely unexplored. Our study examines
the Graph Attention Network (GAT), a popular GNN architecture in which a node's
neighborhood aggregation is weighted by parameterized attention coefficients.
We derive a conservation law of GAT gradient flow dynamics, which explains why
a large portion of the parameters in GATs with standard initialization struggle to
change during training. This effect is amplified in deeper GATs, which perform
significantly worse than their shallow counterparts. To alleviate this problem,
we devise an initialization scheme that balances the GAT network. Our approach
i) allows more effective propagation of gradients and in turn enables
trainability of deeper networks, and ii) attains a considerable speedup in
training and convergence time in comparison to the standard initialization. Our
main theorem serves as a stepping stone to studying the learning dynamics of
positive homogeneous models with attention mechanisms.
Comment: 25 pages. To be published in Advances in Neural Information Processing Systems (NeurIPS), 2023
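For concreteness, the following sketch computes the attention coefficients and the aggregated message for one node and a single attention head (pure NumPy with illustrative names; the conservation law and the balanced initialization scheme are the paper's contributions and are not reproduced here):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_attention(h_i, neighbor_feats, W, a):
    """Single-head GAT attention over one node's neighborhood."""
    z_i = W @ h_i
    z_js = [W @ h_j for h_j in neighbor_feats]
    # Unnormalized scores: e_ij = LeakyReLU(a^T [W h_i || W h_j])
    scores = np.array([leaky_relu(a @ np.concatenate([z_i, z_j])) for z_j in z_js])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                          # softmax over the neighborhood
    message = sum(w * z_j for w, z_j in zip(alpha, z_js))
    return alpha, message                         # attention coefficients and aggregation
```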
Adversarial Weight Perturbation Improves Generalization in Graph Neural Networks
Substantial theoretical and empirical evidence shows that flatter local
minima tend to improve generalization. Adversarial Weight Perturbation (AWP) is
an emerging technique to efficiently and effectively find such minima. In AWP,
we minimize the loss w.r.t. a bounded worst-case perturbation of the model
parameters, thereby favoring local minima with a small loss in a neighborhood
around them. The benefits of AWP, and more generally the connections between
flatness and generalization, have been extensively studied for i.i.d. data such
as images. In this paper, we extensively study this phenomenon for graph data.
Along the way, we first derive a generalization bound for non-i.i.d. node
classification tasks. Then we identify a vanishing-gradient issue with all
existing formulations of AWP and we propose a new Weighted Truncated AWP
(WT-AWP) to alleviate this issue. We show that regularizing graph neural
networks with WT-AWP consistently improves both natural and robust
generalization across many different graph learning tasks and models.
Comment: AAAI 2023
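A minimal one-step sketch of generic AWP training in PyTorch (gamma and the per-parameter scaling are illustrative choices; the paper's Weighted Truncated variant is not reproduced here):

```python
import torch

def awp_training_step(model, loss_fn, optimizer, inputs, targets, gamma=0.01):
    """Perturb the weights towards higher loss, backpropagate at the perturbed
    weights, then restore and update -- favoring flat minima."""
    params = [p for p in model.parameters() if p.requires_grad]
    # 1) gradient of the loss w.r.t. the current weights
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params)
    # 2) bounded worst-case (one-step) perturbation, scaled per parameter
    with torch.no_grad():
        deltas = [gamma * p.norm() * g / (g.norm() + 1e-12)
                  for p, g in zip(params, grads)]
        for p, d in zip(params, deltas):
            p.add_(d)
    # 3) backpropagate the loss evaluated at the perturbed weights
    optimizer.zero_grad()
    adv_loss = loss_fn(model(inputs), targets)
    adv_loss.backward()
    # 4) restore the original weights and apply the update
    with torch.no_grad():
        for p, d in zip(params, deltas):
            p.sub_(d)
    optimizer.step()
    return adv_loss.item()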
Are Defenses for Graph Neural Networks Robust?
A cursory reading of the literature suggests that we have made a lot of progress in designing effective adversarial defenses for Graph Neural Networks (GNNs). Yet, the standard methodology has a serious flaw – virtually all of the defenses are evaluated against non-adaptive attacks, leading to overly optimistic robustness estimates. We perform a thorough robustness analysis of 7 of the most popular defenses spanning the entire spectrum of strategies, i.e., aimed at improving the graph, the architecture, or the training. The results are sobering – most defenses show no or only marginal improvement compared to an undefended baseline. We advocate using custom adaptive attacks as a gold standard and we outline the lessons we learned from successfully designing such attacks. Moreover, our diverse collection of perturbed graphs forms a (black-box) unit test offering a first glance at a model's robustness.
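As an illustration of the adaptive-attack methodology, the sketch below optimizes a feature perturbation directly against the defended model's own loss instead of transferring it from an undefended surrogate (PyTorch; defended_model, adj, and epsilon are illustrative names, and the discrete structure perturbations also studied in the paper require additional machinery):

```python
import torch
import torch.nn.functional as F

def adaptive_feature_attack(defended_model, features, adj, labels,
                            epsilon=0.1, steps=20, step_size=0.01):
    """Projected gradient ascent on the defended model's own loss (L_inf budget)."""
    delta = torch.zeros_like(features, requires_grad=True)
    for _ in range(steps):
        logits = defended_model(features + delta, adj)
        loss = F.cross_entropy(logits, labels)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()     # ascend the defended model's loss
            delta.clamp_(-epsilon, epsilon)      # project onto the perturbation budget
    return (features + delta).detach()
```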